Exploration server on OCR

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A Restoration and Segmentation Unit for the Historic Persian Documents

Internal identifier: 001352 (Main/Exploration); previous: 001351; next: 001353

Authors: Shahpour Alirezaee [Iran]; Shayesteh Fard [Iran]; Hassan Aghaeinia [Iran]; Karim Faez [Iran]

RBID : ISTEX:25A5B5A5BE3B1263DAEBBEBDAB6E9BA6A67855BB

Abstract

Abstract: This paper aims to provide a document restoration and segmentation algorithm for Historic Middle Persian (Pahlavi) manuscripts. The proposed algorithm uses mathematical morphology and connected-component analysis to segment overlapping lines, words, and characters in Middle Persian documents in preparation for OCR. To evaluate the performance of the restoration algorithm, 200 pages of Pahlavi documents were used as experimental data. Numerical results indicate that the proposed algorithm can remove noise and other destructive effects. The results also show 99.14% accuracy for baseline detection, 97.35% accuracy for text line extraction and removal of overlaps from neighboring lines, and 99.5% accuracy for segmenting the extracted text lines into their components.
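The abstract names mathematical morphology and connected components but does not spell out the procedure. The following is a minimal sketch of generic building blocks such a pipeline might use, assuming a binarized page stored as a list of 0/1 rows: 8-connected component labeling, removal of small specks by an area opening (a stand-in for the paper's unspecified morphological restoration), a horizontal projection profile to find line bands with the densest row taken as the baseline (a common heuristic for Persian-script text, assumed here), and per-line components ordered right to left. The function names and the `min_area` threshold are hypothetical; this is not the authors' algorithm.

```python
from collections import deque

# Hypothetical input: a binarized page as a list of rows, 1 = ink, 0 = background.


def connected_components(img):
    """Return the pixel lists of all 8-connected ink regions (BFS flood fill)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def remove_specks(img, min_area=3):
    """Area opening: erase components smaller than min_area (hypothetical threshold)."""
    out = [row[:] for row in img]
    for comp in connected_components(img):
        if len(comp) < min_area:
            for y, x in comp:
                out[y][x] = 0
    return out


def text_lines(img):
    """Split the page into row bands using the horizontal projection profile.

    Returns (top, bottom, baseline) per band; the densest row is assumed to be the baseline.
    """
    profile = [sum(row) for row in img]
    lines, start = [], None
    for r, count in enumerate(profile + [0]):  # trailing 0 closes a band at the page edge
        if count and start is None:
            start = r
        elif not count and start is not None:
            baseline = max(range(start, r), key=lambda i: profile[i])
            lines.append((start, r - 1, baseline))
            start = None
    return lines


def line_components(img, top, bottom):
    """Column extents (x_min, x_max) of components in one line band, right to left."""
    boxes = []
    for comp in connected_components(img[top:bottom + 1]):
        xs = [x for _, x in comp]
        boxes.append((min(xs), max(xs)))
    return sorted(boxes, reverse=True)


# Usage on a tiny synthetic page with one isolated noise pixel at row 4.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [1, 1, 1, 0, 1, 1],
]
clean = remove_specks(page)
print(text_lines(page))   # [(1, 2, 2), (4, 4, 4), (6, 7, 7)]
print(text_lines(clean))  # [(1, 2, 2), (6, 7, 7)]
print(line_components(clean, 6, 7))  # [(4, 5), (0, 2)]
```

Removing the speck before the projection step is what keeps it from being reported as a spurious one-row text line. On real scans the threshold, connectivity, and baseline heuristic would all need tuning, and overlapping strokes between neighboring lines would need more than a plain projection profile.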

DOI: 10.1007/11558484_85


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A Restoration and Segmentation Unit for the Historic Persian Documents</title>
<author>
<name sortKey="Alirezaee, Shahpour" sort="Alirezaee, Shahpour" uniqKey="Alirezaee S" first="Shahpour" last="Alirezaee">Shahpour Alirezaee</name>
</author>
<author>
<name sortKey="Fard, Shayesteh" sort="Fard, Shayesteh" uniqKey="Fard S" first="Shayesteh" last="Fard">Shayesteh Fard</name>
</author>
<author>
<name sortKey="Aghaeinia, Hassan" sort="Aghaeinia, Hassan" uniqKey="Aghaeinia H" first="Hassan" last="Aghaeinia">Hassan Aghaeinia</name>
</author>
<author>
<name sortKey="Faez, Karim" sort="Faez, Karim" uniqKey="Faez K" first="Karim" last="Faez">Karim Faez</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:25A5B5A5BE3B1263DAEBBEBDAB6E9BA6A67855BB</idno>
<date when="2005" year="2005">2005</date>
<idno type="doi">10.1007/11558484_85</idno>
<idno type="url">https://api.istex.fr/document/25A5B5A5BE3B1263DAEBBEBDAB6E9BA6A67855BB/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001883</idno>
<idno type="wicri:Area/Istex/Curation">001785</idno>
<idno type="wicri:Area/Istex/Checkpoint">000C61</idno>
<idno type="wicri:doubleKey">0302-9743:2005:Alirezaee S:a:restoration:and</idno>
<idno type="wicri:Area/Main/Merge">001388</idno>
<idno type="wicri:source">INIST</idno>
<idno type="RBID">Pascal:06-0001215</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000425</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000362</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000439</idno>
<idno type="wicri:doubleKey">0302-9743:2005:Alirezaee S:a:restoration:and</idno>
<idno type="wicri:Area/Main/Merge">001469</idno>
<idno type="wicri:Area/Main/Curation">001352</idno>
<idno type="wicri:Area/Main/Exploration">001352</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">A Restoration and Segmentation Unit for the Historic Persian Documents</title>
<author>
<name sortKey="Alirezaee, Shahpour" sort="Alirezaee, Shahpour" uniqKey="Alirezaee S" first="Shahpour" last="Alirezaee">Shahpour Alirezaee</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Iran</country>
<wicri:regionArea>Electrical Engineering Department, Islamic Azad University of Abhar, Abhar</wicri:regionArea>
<wicri:noRegion>Abhar</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Iran</country>
</affiliation>
</author>
<author>
<name sortKey="Fard, Shayesteh" sort="Fard, Shayesteh" uniqKey="Fard S" first="Shayesteh" last="Fard">Shayesteh Fard</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Iran</country>
<wicri:regionArea>Electrical Engineering Department, Zanjan University, Zanjan</wicri:regionArea>
<wicri:noRegion>Zanjan</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Iran</country>
</affiliation>
</author>
<author>
<name sortKey="Aghaeinia, Hassan" sort="Aghaeinia, Hassan" uniqKey="Aghaeinia H" first="Hassan" last="Aghaeinia">Hassan Aghaeinia</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Iran</country>
<wicri:regionArea>Electrical Engineering Department, Amirkabir University of Technology, Hafez Ave., Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Iran</country>
</affiliation>
</author>
<author>
<name sortKey="Faez, Karim" sort="Faez, Karim" uniqKey="Faez K" first="Karim" last="Faez">Karim Faez</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Iran</country>
<wicri:regionArea>Electrical Engineering Department, Amirkabir University of Technology, Hafez Ave., Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Iran</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2005</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
<idno type="istex">25A5B5A5BE3B1263DAEBBEBDAB6E9BA6A67855BB</idno>
<idno type="DOI">10.1007/11558484_85</idno>
<idno type="ChapterID">85</idno>
<idno type="ChapterID">Chap85</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Character recognition</term>
<term>Document analysis</term>
<term>Document processing</term>
<term>Iran</term>
<term>Mathematical morphology</term>
<term>Optical character recognition</term>
<term>Pattern extraction</term>
<term>Text</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Analyse documentaire</term>
<term>Extraction forme</term>
<term>Iran</term>
<term>Langage iranien</term>
<term>Morphologie mathématique</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Restauration document perse</term>
<term>Segmentation document perse</term>
<term>Texte</term>
<term>Traitement document</term>
</keywords>
<keywords scheme="Wicri" type="geographic" xml:lang="fr">
<term>Iran</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: This paper aims to provide a document restoration and segmentation algorithm for the Historic Middle Persian or Pahlavi manuscripts. The proposed algorithm uses the mathematical morphology and connected component concept to segment the line, word, and character overlapped in the Middle-age Persian documents in preparation for OCR application. To evaluate the performance of the restoration algorithm, 200 pages of the Pahlavi documents are used as experimental data in our test. Numerical results indicate that the proposed algorithm can remove the noise and destructive effects. The results also show 99.14% accuracy on the baseline detection, 97.35% accuracy on the text line extraction and removing other lines overlaps, and 99.5% accuracy for segmenting the extracted text lines to their components.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Iran</li>
</country>
</list>
<tree>
<country name="Iran">
<noRegion>
<name sortKey="Alirezaee, Shahpour" sort="Alirezaee, Shahpour" uniqKey="Alirezaee S" first="Shahpour" last="Alirezaee">Shahpour Alirezaee</name>
</noRegion>
<name sortKey="Aghaeinia, Hassan" sort="Aghaeinia, Hassan" uniqKey="Aghaeinia H" first="Hassan" last="Aghaeinia">Hassan Aghaeinia</name>
<name sortKey="Alirezaee, Shahpour" sort="Alirezaee, Shahpour" uniqKey="Alirezaee S" first="Shahpour" last="Alirezaee">Shahpour Alirezaee</name>
<name sortKey="Faez, Karim" sort="Faez, Karim" uniqKey="Faez K" first="Karim" last="Faez">Karim Faez</name>
<name sortKey="Fard, Shayesteh" sort="Fard, Shayesteh" uniqKey="Fard S" first="Shayesteh" last="Fard">Shayesteh Fard</name>
</country>
</tree>
</affiliations>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001352 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001352 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:25A5B5A5BE3B1263DAEBBEBDAB6E9BA6A67855BB
   |texte=   A Restoration and Segmentation Unit for the Historic Persian Documents
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024